36 research outputs found

    Humans and deep networks largely agree on which kinds of variation make object recognition harder

    Get PDF
    View-invariant object recognition is a challenging problem, which has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNN), which are currently the best algorithms for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition using the same images and controlling for both the kinds of transformation as well as their magnitude. We used four object categories and images were rendered from 3D computer models. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulties of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position. This suggests that humans recognize objects mainly through 2D template matching, rather than by constructing 3D object models, and that DCNNs are not too unreasonable models of human feed-forward vision. Also, our results show that the variation levels in rotation in depth and scale strongly modulate both humans' and DCNNs' recognition performances. We thus argue that these variations should be controlled in the image datasets used in vision research

    Modelling human choices: MADeM and decision‑making

    Get PDF
    Research supported by FAPESP 2015/50122-0 and DFG-GRTK 1740/2. RP and AR are also part of the Research, Innovation and Dissemination Center for Neuromathematics FAPESP grant (2013/07699-0). RP is supported by a FAPESP scholarship (2013/25667-8). ACR is partially supported by a CNPq fellowship (grant 306251/2014-0)

    Deep learning in spiking neural networks

    No full text
    International audienceIn recent years, deep learning has revolutionized the field of machine learning, for computer vision in particular. In this approach, a deep (multilayer) artificial neural network (ANN) is trained, most often in a supervised manner using backpropagation. Vast amounts of labeled training examples are required, but the resulting classification accuracy is truly impressive, sometimes outperforming humans.Neurons in an ANN are characterized by a single, static, continuous-valued activation. Yet biological neurons use discrete spikes to compute and transmit information, and the spike times, in addition to the spike rates, matter. Spiking neural networks (SNNs) are thus more biologically realistic than ANNs, and are arguably the only viable option if one wants to understand how the brain computes at the neuronal description level. The spikes of biological neurons are sparse in time and space, and event-driven. Combined with bio-plausible local learning rules, this makes it easier to build low-power, neuromorphic hardware for SNNs. However, training deep SNNs remains a challenge. Spiking neurons’ transfer function is usually non-differentiable, which prevents using backpropagation.Here we review recent supervised and unsupervised methods to train deep SNNs, and compare them in terms of accuracy and computational cost. The emerging picture is that SNNs still lag behind ANNs in terms of accuracy, but the gap is decreasing, and can even vanish on some tasks, while SNNs typically require many fewer operations and are the better candidates to process spatio-temporal data

    Spiking Neural Networks Trained via Proxy

    No full text
    International audienc

    An evidence-based combining classifier for brain signal analysis.

    Get PDF
    Nowadays, brain signals are employed in various scientific and practical fields such as Medical Science, Cognitive Science, Neuroscience, and Brain Computer Interfaces. Hence, the need for robust signal analysis methods with adequate accuracy and generalizability is inevitable. The brain signal analysis is faced with complex challenges including small sample size, high dimensionality and noisy signals. Moreover, because of the non-stationarity of brain signals and the impacts of mental states on brain function, the brain signals are associated with an inherent uncertainty. In this paper, an evidence-based combining classifiers method is proposed for brain signal analysis. This method exploits the power of combining classifiers for solving complex problems and the ability of evidence theory to model as well as to reduce the existing uncertainty. The proposed method models the uncertainty in the labels of training samples in each feature space by assigning soft and crisp labels to them. Then, some classifiers are employed to approximate the belief function corresponding to each feature space. By combining the evidence raised from each classifier through the evidence theory, more confident decisions about testing samples can be made. The obtained results by the proposed method compared to some other evidence-based and fixed rule combining methods on artificial and real datasets exhibit the ability of the proposed method in dealing with complex and uncertain classification problems

    BS4NN: Binarized Spiking Neural Networks with Temporal Coding and Learning

    No full text
    International audienceWe recently proposed the S4NN algorithm, essentially an adaptation of backpropagation to multilayer spiking neural networks that use simple nonleaky integrate-and-fire neurons and a form of temporal coding known as time-to-first-spike coding. With this coding scheme, neurons fire at most once per stimulus, but the firing order carries information. Here, we introduce BS4NN, a modification of S4NN in which the synaptic weights are constrained to be binary (+1 or -1), in order to decrease memory and computation footprints. This was done using two sets of weights: firstly, real-valued weights, updated by gradient descent, and used in the backward pass of backpropagation, and secondly, their signs, used in the forward pass. Similar strategies have been used to train (non-spiking) binarized neural networks. The main difference is that BS4NN operates in the time domain: spikes are propagated sequentially, and different neurons may reach their threshold at different times, which increases computational power. We validated BS4NN on two popular benchmarks, MNIST and Fashion MNIST, and obtained state-of-the-art accuracies for this sort of networks (97.0% and 87.3% respectively) with a negligible accuracy drop with re-spect to real-valued weights (0.4% and 0.7%, respectively). We also demonstrated that BS4NNoutperforms a simple BNN with the same architectures on those two datasets (by 0.2% and 0.9% respectively), presumably because it leverages the temporal dimension

    Humans and deep networks largely agree on which kinds of variation make object recognition harder

    No full text
    View-invariant object recognition is a challenging problem that has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNN), which are currently the best models for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition task using the same set of images and controlling the kinds of transformation (position, scale, rotation in plane, and rotation in depth) as well as their magnitude, which we call variation level. We used four object categories: car, ship, motorcycle, and animal. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs (proposed respectively by Hinton's group and Zisserman's group) on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulties of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position (much easier). This suggests that DCNNs would be reasonable models of human feed-forward vision. In addition, our results show that the variation levels in rotation in depth and scale strongly modulate both humans' and DCNNs' recognition performances. We thus argue that these variations should be controlled in the image datasets used in vision research
    corecore